The core data structure of Keras is a model, a way to organize layers. The main type of model is the Sequential model, a linear stack of layers.
from keras.models import Sequential
model = Sequential()
Stacking layers is as easy as .add():
from keras.layers import Dense, Activation
model.add(Dense(64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(10))
model.add(Activation("softmax"))
Once your model looks good, configure its learning process with .compile():
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
If you need to, you can further configure your optimizer:
from keras.optimizers import SGD
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
You can now iterate on your training data in batches:
model.fit(X_train, Y_train, epochs=5, batch_size=32)
Evaluate your performance in one line:
loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)
Or generate predictions on new data:
classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)
In [ ]:
'''
Trains a simple deep NN on the MNIST dataset.
You can get to 98.40% test accuracy after 20 epochs.
'''
from __future__ import print_function
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
batch_size = 128
nb_classes = 10
nb_epoch = 10
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
# print model characteristics
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
history = model.fit(X_train,
                    Y_train,
                    batch_size=batch_size,
                    epochs=nb_epoch,
                    verbose=1,
                    validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('\n')
print('Test score:', score[0])
print('Test accuracy:', score[1])
We are going to train an RNN "character-level" language model.
That is, we’ll give the RNN a huge chunk of text and ask it to model the probability distribution of the next character in the sequence given a sequence of previous characters. This will then allow us to generate new text one character at a time.
We will encode each character into a vector using 1-of-k encoding (i.e. all zeros except for a single one at the index of the character in the vocabulary), and feed them into the RNN one at a time.
At test time, we will feed a character into the RNN and get a distribution over what characters are likely to come next. We sample from this distribution, and feed it right back in to get the next letter. Repeat this process and you’re sampling text!
We can also play with the temperature of the softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures give more diversity, but at the cost of more mistakes.
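To get a feel for what the temperature does, here is a small NumPy sketch (the probabilities are made up for illustration) that rescales a fixed distribution the same way the sample() helper defined later in this notebook does: divide the log-probabilities by the temperature and renormalize.
import numpy as np

def apply_temperature(probs, temperature):
    # divide the log-probabilities by the temperature and renormalize
    logits = np.log(probs) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

probs = np.array([0.5, 0.3, 0.15, 0.05])   # made-up next-character distribution
print(apply_temperature(probs, 0.5))  # sharper: mass concentrates on the most likely character
print(apply_temperature(probs, 1.0))  # unchanged
print(apply_temperature(probs, 2.0))  # flatter: more diversity, but more mistakes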
In order to process sequences of symbols with an RNN we need to represent these symbols by numbers.
Let's suppose we have $|V|$ different symbols. The simplest representation is the one-hot vector: represent every symbol as an $\mathbb{R}^{|V|\times 1}$ vector with all $0$s and one $1$ at the index of that symbol. Symbol vectors in this encoding would appear as the following:
$$w^{s_1} = \left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right], w^{s_2} = \left[ \begin{array}{c} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array} \right], w^{s_3} = \left[ \begin{array}{c} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{array} \right], \cdots, w^{s_{|V|}} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{array} \right]$$
We represent each symbol as a completely independent entity. This symbol representation does not directly give us any notion of similarity.
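As a toy illustration (the four-symbol vocabulary below is made up), the following NumPy snippet builds these one-hot vectors and shows that the dot product between any two different symbols is zero, which is why this encoding carries no similarity information.
import numpy as np

vocab = ['a', 'b', 'c', 'd']  # toy vocabulary, |V| = 4
symbol_to_index = {s: i for i, s in enumerate(vocab)}

def one_hot(symbol):
    # all zeros except a single one at the index of the symbol
    v = np.zeros(len(vocab))
    v[symbol_to_index[symbol]] = 1
    return v

print(one_hot('a'))                        # [1. 0. 0. 0.]
print(one_hot('c'))                        # [0. 0. 1. 0.]
print(np.dot(one_hot('a'), one_hot('c')))  # 0.0 -- no notion of similarity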
To train our model we need text to learn from; in this case, a large dataset of names. Fortunately we don't need any labels to train a language model, just raw text.
In [ ]:
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys
import codecs
f = codecs.open('data/NombresMujerBarcelona.txt', "r", "utf-8")
#f = codecs.open('data/toponims.txt', "r", "utf-8")
string = f.read()  # codecs.open already decodes the file as UTF-8
text = string.lower()
# text = text.replace("\n", " ")
print('corpus length:', len(text))
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 20
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))
print('Vectorization...')
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
Classical neural networks, including convolutional ones, suffer from two severe limitations: they only accept inputs (and produce outputs) of a fixed size, and they map inputs to outputs in a fixed number of computational steps, with no memory of previous inputs.
Recurrent neural networks overcome these limitations by allowing the network to operate over sequences of vectors (in the input, in the output, or both).
Basic RNN architecture:
Unrolling in time of an RNN (by unrolling we mean that we write out the network for the complete sequence):
Training an RNN is similar to training a traditional NN, but with some modifications.
The main reason is that the parameters are shared by all time steps: in order to compute the gradient at t=4, we need to propagate the error back through the previous 3 steps and sum up the gradients.
This is called Backpropagation through time (BPTT).
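The following minimal NumPy sketch (with made-up dimensions and random data) unrolls a vanilla RNN over a short sequence. Note that the same weight matrices W_xh, W_hh and W_hy are reused at every time step; this parameter sharing is exactly why BPTT has to sum the gradients over time.
import numpy as np

rng = np.random.RandomState(0)
input_dim, hidden_dim, output_dim, T = 5, 8, 3, 4   # illustrative sizes

W_xh = rng.randn(hidden_dim, input_dim) * 0.01      # input -> hidden
W_hh = rng.randn(hidden_dim, hidden_dim) * 0.01     # hidden -> hidden (recurrence)
W_hy = rng.randn(output_dim, hidden_dim) * 0.01     # hidden -> output

xs = [rng.randn(input_dim) for _ in range(T)]       # dummy input sequence
h = np.zeros(hidden_dim)                            # initial hidden state
for t, x in enumerate(xs):
    h = np.tanh(W_xh @ x + W_hh @ h)                # new state depends on input and previous state
    y = W_hy @ h                                    # output (unnormalized scores) at time step t
    print('step', t, 'output:', y)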
Vanilla RNNs trained with SGD are unstable and difficult to train, but various tricks make our life easier; the most important one is the use of gated units.
There are two main types of gated RNNs: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).
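Both kinds of gated layer are available in Keras with the same call signature, so one can be swapped for the other with a one-line change; the snippet below is only a sketch with an illustrative input shape (20 time steps, 40 symbols).
from keras.models import Sequential
from keras.layers import LSTM, GRU

lstm_model = Sequential()
lstm_model.add(LSTM(64, input_shape=(20, 40)))   # LSTM variant

gru_model = Sequential()
gru_model.add(GRU(64, input_shape=(20, 40)))     # GRU as a drop-in replacement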
In [ ]:
# build the model
print('Build model...')
model = Sequential()
model.add(LSTM(64,
               dropout=0.2,
               recurrent_dropout=0.2,
               input_shape=(maxlen, len(chars))))
#model.add(LSTM(64,
# dropout_W=0.2,
# dropout_U=0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
The simplest way to use the Keras LSTM model to make predictions is to start with a seed sequence as input, generate the next character, then update the seed sequence by adding the generated character to the end and trimming off the first character.
This process is repeated for as long as we want to predict new characters.
In [ ]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
# train the model, output generated text after each iteration
for iteration in range(1, 60):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=256, epochs=1)

    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.5, 1.0]:
        print()
        print('----- diversity:', diversity)

        # start each diversity run from the same seed sequence
        sentence = text[start_index: start_index + maxlen]
        generated = sentence
        print('----- Generating with seed: "' + sentence.replace("\n", " ") + '"')

        for i in range(50):
            # one-hot encode the current seed sequence
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            # append the sampled character and slide the window forward
            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()